The main aim of this analysis is to answer five of the most important questions that come to a young entrepreneur's mind when deciding to start a company: Who is the most important investor in the country? How does the funding ecosystem change with time? Who is the biggest investor in the country? Do cities play a major role in funding? Which industries are favored by investors for funding?
This dataset was downloaded from Kaggle. It contains funding information for Indian startups from January 2015 to January 2020, including the funding date, the city the startup is based in, the names of the investors, and the amount invested (in USD). The first step is to install the opendatasets module, which downloads the dataset and stores it alongside the Jupyter notebook. From there we work in the directory in which the downloaded file is stored.
!pip install jovian opendatasets --upgrade --quiet
Let's begin by downloading the data, and listing the files within the dataset.
dataset_url = 'https://www.kaggle.com/sudalairajkumar/indian-startup-funding'
import opendatasets as od
od.download(dataset_url)
Skipping, found downloaded files in "./indian-startup-funding" (use force=True to force download)
The dataset has been downloaded and extracted.
data_dir = './indian-startup-funding'
import os
os.listdir(data_dir)
['startup_funding.csv']
project_name = "indian-startup-funding-analysis-project-jovian-main"
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)
[jovian] Updating notebook "adithyanovak2001/indian-startup-funding-analysis-project-jovian-main" on https://jovian.ai [jovian] Committed successfully! https://jovian.ai/adithyanovak2001/indian-startup-funding-analysis-project-jovian-main
'https://jovian.ai/adithyanovak2001/indian-startup-funding-analysis-project-jovian-main'
After downloading the dataset and locating the directory it is stored in, we use the read_csv function from the pandas library to load it into a DataFrame. We then clean corrupt values, rename columns, handle missing data, parse dates, and convert column data types, among other operations.
import pandas as pd
startup_data_df=pd.read_csv(data_dir + "/startup_funding.csv")
startup_data_df
| | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City Location | Investors Name | InvestmentnType | Amount in USD | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN |
| 1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN |
| 2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN |
| 3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN |
| 4 | 5 | 02/01/2020 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Round | 18,00,000 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3039 | 3040 | 29/01/2015 | Printvenue | NaN | NaN | NaN | Asia Pacific Internet Group | Private Equity | 45,00,000 | NaN |
| 3040 | 3041 | 29/01/2015 | Graphene | NaN | NaN | NaN | KARSEMVEN Fund | Private Equity | 8,25,000 | Govt backed VC Fund |
| 3041 | 3042 | 30/01/2015 | Mad Street Den | NaN | NaN | NaN | Exfinity Fund, GrowX Ventures. | Private Equity | 15,00,000 | NaN |
| 3042 | 3043 | 30/01/2015 | Simplotel | NaN | NaN | NaN | MakeMyTrip | Private Equity | NaN | Strategic Funding, Minority stake |
| 3043 | 3044 | 31/01/2015 | couponmachine.in | NaN | NaN | NaN | UK based Group of Angel Investors | Seed Funding | 1,40,000 | NaN |
3044 rows × 10 columns
# Rename all columns in one call; the keys are the original CSV headers.
startup_data_df = startup_data_df.rename(columns={
    "Sr No": "s_no",
    "Date dd/mm/yyyy": "Date",
    "Startup Name": "Startup_Name",
    "Industry Vertical": "Industry",
    "SubVertical": "Sub_Industry",
    "City Location": "Location",
    "Investors Name": "Investor",
    "InvestmentnType": "Investment_Type",
    "Amount in USD": "Amount_in_USD",
})
startup_data_df.head(10)
| | s_no | Date | Startup_Name | Industry | Sub_Industry | Location | Investor | Investment_Type | Amount_in_USD | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN |
| 1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN |
| 2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN |
| 3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN |
| 4 | 5 | 02/01/2020 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Round | 18,00,000 | NaN |
| 5 | 6 | 13/01/2020 | Pando | Logistics | Open-market, freight management platform | Chennai | Chiratae Ventures | Series A | 90,00,000 | NaN |
| 6 | 7 | 10/01/2020 | Zomato | Hospitality | Online Food Delivery Platform | Gurgaon | Ant Financial | Private Equity Round | 15,00,00,000 | NaN |
| 7 | 8 | 12/12/2019 | Ecozen | Technology | Agritech | Pune | Sathguru Catalyzer Advisors | Series A | 60,00,000 | NaN |
| 8 | 9 | 06/12/2019 | CarDekho | E-Commerce | Automobile | Gurgaon | Ping An Global Voyager Fund | Series D | 7,00,00,000 | NaN |
| 9 | 10 | 03/12/2019 | Dhruva Space | Aerospace | Satellite Communication | Bengaluru | Mumbai Angels, Ravikanth Reddy | Seed | 5,00,00,000 | NaN |
# Parse the date column; malformed dates become NaT and those rows are dropped.
# Note: the raw values are dd/mm/yyyy but no dayfirst=True is passed here, so
# ambiguous dates such as 09/01/2020 are read month-first (2020-09-01).
startup_data_df['Date'] = pd.to_datetime(startup_data_df['Date'], errors='coerce')
startup_data_df = startup_data_df.dropna(subset=['Date'])
# Sort chronologically, rebuild the index, and renumber the rows from 1.
startup_data_df = startup_data_df.sort_values(by="Date").reset_index(drop=True)
startup_data_df['s_no'] = startup_data_df.index + 1
startup_data_df
| | s_no | Date | Startup_Name | Industry | Sub_Industry | Location | Investor | Investment_Type | Amount_in_USD | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2015-01-05 | Termsheet | Fund Raising Platform | NaN | Chennai | Anand Vijay, Nipun Dureja, Satyajit Heeralal, ... | Seed Funding | 1,00,000 | NaN |
| 1 | 2 | 2015-01-05 | Foodpanda | Online Food Delivery | NaN | Gurgaon | Goldman Sachs, Rocket Internet | Private Equity | 10,00,00,000 | Series D |
| 2 | 3 | 2015-01-06 | Proviera | Probiotic Technology Products Manufacturer | NaN | Chennai | Infuse Ventures | Private Equity | 5,50,000 | NaN |
| 3 | 4 | 2015-01-06 | Arth DesignBuild | Architectural Design & Consulting | NaN | Hyderabad | Srinivas Tirupati | Seed Funding | 5,00,000 | NaN |
| 4 | 5 | 2015-01-06 | Glamrs | Online Fashion Video Portal | NaN | Mumbai | Ventureworks India, Blume Ventures, Batlivala ... | Private Equity | 10,00,000 | Pre-Series A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3035 | 3036 | 2020-02-01 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Round | 18,00,000 | NaN |
| 3036 | 3037 | 2020-02-01 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN |
| 3037 | 3038 | 2020-09-01 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN |
| 3038 | 3039 | 2020-09-01 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN |
| 3039 | 3040 | 2020-10-01 | Zomato | Hospitality | Online Food Delivery Platform | Gurgaon | Ant Financial | Private Equity Round | 15,00,00,000 | NaN |
3040 rows × 10 columns
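The raw `Date` column is written day-first (dd/mm/yyyy), so 09/01/2020 means 9 January 2020. The following is a minimal sketch, using a few sample strings rather than the real file, of how an explicit format makes that reading unambiguous. The variable names and the "not a date" entry are illustrative assumptions.

```python
import pandas as pd

# Sample strings in the dataset's dd/mm/yyyy layout (illustrative values only).
raw_dates = pd.Series(["09/01/2020", "13/01/2020", "31/01/2015", "not a date"])

# An explicit format forces day-first parsing; unparseable entries become NaT.
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y", errors="coerce")

print(parsed)
```

Applying this to the notebook would change the sort order of the rows, so any row indices that are dropped by hard-coded position afterwards would need to be rechecked.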
# Remove the unwanted rows by index label in a single call.
rows_to_drop = [459, 466, 619, 620, 460, 461, 617, 618, 704, 705,
                2898, 2922, 2984, 2963, 2964, 2971, 3014, 3003, 465]
startup_data_df = startup_data_df.drop(rows_to_drop)
startup_data_df['Amount_in_USD'] = startup_data_df['Amount_in_USD'].str.replace(',', '')
def correct_name(x):
    # Ordered (regex pattern, replacement) pairs applied to the startup names.
    replacements = [
        (r'\\\\xc2\\\\xa0', ""),
        (r'https://www.wealthbucket.in/', "wealthbucket"),
        (r'\\\\xe2\\\\x80\\\\x99', ""),
        (r'\\xe2\\x80\\x99', ""),
    ]
    for pattern, replacement in replacements:
        x = x.str.replace(pattern, replacement, regex=True)
    return x
startup_data_df['Startup_Name'] = correct_name(startup_data_df['Startup_Name'])
def correct_city(x):
    # Ordered (regex pattern, replacement) pairs; the order matters because
    # later patterns rely on earlier ones (e.g. Bangalore -> Bengaluru).
    replacements = [
        (r'\\\\xc2\\\\xa0', ""),
        (r'\\\\\\\\xc2\\\\\\\\xa0', ""),
        # Bengaluru
        ('Bangalore', 'Bengaluru'),
        (r'Bengaluru and Gurugram', 'Bengaluru'),
        (r'Bengaluru/ Bangkok', 'Bengaluru'),
        (r'Bengaluru / SFO', 'Bengaluru'),
        (r'SFO / Bengaluru', 'Bengaluru'),
        (r'New York, Bengaluru', 'Bengaluru'),
        (r'Bengaluru / Palo Alto', 'Bengaluru'),
        (r'Bengaluru / San Mateo', 'Bengaluru'),
        (r'Seattle / Bengaluru', 'Bengaluru'),
        (r'Bengaluru / USA', 'Bengaluru'),
        # Gurgaon
        ('Gurgaon', 'Gurugram'),
        (r'Gurugram / SFO', 'Gurugram'),
        # Delhi
        ('Nw Delhi', 'Delhi'),
        ('Delhi / California', 'Delhi'),
        (r'Delhi / Houston', 'Delhi'),
        (r'New Delhi', 'Delhi'),
        (r'Delhi / US', 'Delhi'),
        (r'Delhi & Cambridge', 'Delhi'),
        (r'Delhi/ Houston', 'Delhi'),
        # Goa
        (r'Goa/Hyderabad', 'Goa'),
        (r'Panaji', 'Goa'),
        # Mumbai
        (r'Mumbai / Global', 'Mumbai'),
        (r'Andheri', 'Mumbai'),
        (r'Mumbai/Bengaluru', 'Mumbai'),
        (r'Mumbai / UK', 'Mumbai'),
        (r'Mumbai / NY', 'Mumbai'),
        # Pune
        (r'Pune / Singapore', 'Pune'),
        (r'Pune / US', 'Pune'),
        (r'Pune / Dubai', 'Pune'),
        (r'Pune/Seattle', 'Pune'),
        # Hyderabad
        (r'Dallas / Hyderabad', 'Hyderabad'),
        (r'Hyderabad/USA', 'Hyderabad'),
        # Chennai
        (r'Chennai/ Singapore', 'Chennai'),
        # Noida
        (r'Noida / Singapore', 'Noida'),
    ]
    for pattern, replacement in replacements:
        x = x.str.replace(pattern, replacement, regex=True)
    return x
startup_data_df['Location'] = correct_city(startup_data_df['Location'])
def correct_investor(x):
    # Ordered (regex pattern, replacement) pairs; the order matters because
    # some patterns tidy up the result of an earlier replacement.
    replacements = [
        (r'\\\\xc2\\\\xa0', ""),
        (r'\\\\\\\\xc2\\\\\\\\xa0', ""),
        (r'\\\\xe2\\\\x80\\\\x99O', ""),
        (r'\\\\\\\\xe2\\\\\\\\x80\\\\\\\\x99O', ""),
        (r'\\\xe2\\\\x80\\\\x99', ""),
        (r'\\\\xe2\\\\x80\\\\x99', ""),
        (r'\\\\\\\\xc3\\\\\\\\x98', ""),
        (r'\\\\\\\\n\\\\\\\\n', ""),
        # Undisclosed investors
        (r'Undisclosed Investor', "Undisclosed Investors"),
        (r'Undisclosed investor', "Undisclosed Investors"),
        (r'Undisclosed Investorss', "Undisclosed Investors"),
        (r'Undisclosed HNIs', "Undisclosed Investors"),
        (r'Undisclosed Angel investors & HNIs', "Undisclosed Investors"),
        (r'undisclosed investors', "Undisclosed Investors"),
        (r'undisclosed investor', "Undisclosed Investors"),
        (r'Undisclosed angel investors', "Undisclosed Investors"),
        # The parentheses are escaped so that they match literally.
        (r'High Networth Individuals \(undisclosed\)', "Undisclosed Investors"),
        (r'undisclosed private investors', "Undisclosed Investors"),
        # SoftBank
        (r'SoftBank Group', "SoftBank"),
        (r'Softbank', "SoftBank"),
    ]
    for pattern, replacement in replacements:
        x = x.str.replace(pattern, replacement, regex=True)
    return x
startup_data_df['Investor'] = correct_investor(startup_data_df['Investor'])
def correct_industry(x):
    # Ordered (regex pattern, replacement) pairs applied to the industry labels.
    replacements = [
        (r'\\\\xc2\\\\xa0', ""),
        (r'\\\\xc3\\\\xa9', ""),
        (r'\\\\xe2\\\\x80\\\\x99', ""),
        (r'\\xe2\\x80\\x93', ""),
        (r'\\\\n', " "),
        (r'EdTech', "Ed-Tech"),
        (r'Edtech', "Ed-Tech"),
        (r'Ecommerce', "E-commerce"),
        (r'eCommerce', "E-commerce"),
        (r'ecommerce', "E-commerce"),
        (r'ECommerce', "E-commerce"),
        (r'E-Commerce', "E-commerce"),
        (r'Fintech', "FinTech"),
        (r'Fin-Tech', "FinTech"),
        (r'healthcare', "Healthcare"),
        (r'SAAS', "SaaS"),
        (r'Saas', "SaaS"),
        (r'logistics', "Logistics"),
        (r'on-demand', "OnDemand"),
        (r'online', "Online"),
    ]
    for pattern, replacement in replacements:
        x = x.str.replace(pattern, replacement, regex=True)
    return x
startup_data_df['Industry'] = correct_industry(startup_data_df['Industry'])
def correct_funding(x):
    # Ordered (regex pattern, replacement) pairs applied to the investment types.
    replacements = [
        # Seed-Angel Funding
        (r'Seed/ Angel Funding', "Seed Angel Funding"),
        (r'Angel / Seed Funding', "Seed Funding"),
        (r'Seed / Angel Funding', "Seed Angel Funding"),
        (r'Seed / Angle Funding', "Seed Funding"),
        (r'Seed/Angel Funding', "Seed Funding"),
        # Seed Funding
        (r'Seed\\\\nFunding', "Seed Funding"),
        (r'Seed\\nFunding', "Seed Funding"),
        (r'Seed Round', "Seed Funding"),
        (r'Seed funding', "Seed Funding"),
        (r'Seed Funding Round', "Seed Funding"),
        # Pre-Series A
        (r'pre-series A', "Pre-Series A"),
        (r'Pre Series A', "Pre-Series A"),
        (r'Pre-series A', "Pre-Series A"),
        (r'pre-Series A', "Pre-Series A"),
        # Equity
        (r'Equity Based Funding', "Equity"),
        # Angel
        (r'Angel Funding', "Angel"),
        (r'Angel Round', "Angel"),
        # Private Equity
        (r'PrivateEquity', "Private Equity"),
        (r'Private Funding', "Private Equity"),
        (r'Private\\\\nEquity', "Private Equity"),
        (r'Private Equity Round', "Private Equity"),
    ]
    for pattern, replacement in replacements:
        x = x.str.replace(pattern, replacement, regex=True)
    return x
startup_data_df['Investment_Type'] = correct_funding(startup_data_df['Investment_Type'])
startup_data_df.Amount_in_USD = startup_data_df.Amount_in_USD.astype(float)
startup_data_df
| | s_no | Date | Startup_Name | Industry | Sub_Industry | Location | Investor | Investment_Type | Amount_in_USD | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2015-01-05 | Termsheet | Fund Raising Platform | NaN | Chennai | Anand Vijay, Nipun Dureja, Satyajit Heeralal, ... | Seed Funding | 100000.0 | NaN |
| 1 | 2 | 2015-01-05 | Foodpanda | Online Food Delivery | NaN | Gurugram | Goldman Sachs, Rocket Internet | Private Equity | 100000000.0 | Series D |
| 2 | 3 | 2015-01-06 | Proviera | Probiotic Technology Products Manufacturer | NaN | Chennai | Infuse Ventures | Private Equity | 550000.0 | NaN |
| 3 | 4 | 2015-01-06 | Arth DesignBuild | Architectural Design & Consulting | NaN | Hyderabad | Srinivas Tirupati | Seed Funding | 500000.0 | NaN |
| 4 | 5 | 2015-01-06 | Glamrs | Online Fashion Video Portal | NaN | Mumbai | Ventureworks India, Blume Ventures, Batlivala ... | Private Equity | 1000000.0 | Pre-Series A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3035 | 3036 | 2020-02-01 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Funding | 1800000.0 | NaN |
| 3036 | 3037 | 2020-02-01 | wealthbucket | FinTech | Online Investment | Delhi | Vinod Khatumal | Pre-Series A | 3000000.0 | NaN |
| 3037 | 3038 | 2020-09-01 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 18358860.0 | NaN |
| 3038 | 3039 | 2020-09-01 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity | 200000000.0 | NaN |
| 3039 | 3040 | 2020-10-01 | Zomato | Hospitality | Online Food Delivery Platform | Gurugram | Ant Financial | Private Equity | 150000000.0 | NaN |
3021 rows × 10 columns
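The amounts in the raw file use Indian digit grouping (for example 20,00,00,000), and the table above shows them after conversion to floats. As a minimal sketch of that conversion on sample values, the snippet below strips the separators and converts with `pd.to_numeric`. The "undisclosed" entry is a hypothetical non-numeric value, added only to show how `errors="coerce"` turns it into NaN, whereas a plain `astype(float)` would raise on such a value.

```python
import pandas as pd

# Amount strings with Indian digit grouping, plus a hypothetical non-numeric entry.
raw_amounts = pd.Series(["20,00,00,000", "80,48,394", "undisclosed", None])

# Strip the separators, then coerce anything non-numeric to NaN.
amounts = pd.to_numeric(raw_amounts.str.replace(",", "", regex=False), errors="coerce")

print(amounts)
```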
startup_data_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3021 entries, 0 to 3039
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   s_no             3021 non-null   int64
 1   Date             3021 non-null   datetime64[ns]
 2   Startup_Name     3021 non-null   object
 3   Industry         2851 non-null   object
 4   Sub_Industry     2099 non-null   object
 5   Location         2842 non-null   object
 6   Investor         2997 non-null   object
 7   Investment_Type  3017 non-null   object
 8   Amount_in_USD    2061 non-null   float64
 9   Remarks          414 non-null    object
dtypes: datetime64[ns](1), float64(1), int64(1), object(7)
memory usage: 259.6+ KB
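The `info()` summary shows that several columns are only partly filled; `Amount_in_USD`, for instance, has 2061 non-null values out of 3021 rows. A small sketch of how the non-null counts could be turned into missing-value shares is given below. It runs on a toy frame with made-up rows, and the column names are reused only for illustration.

```python
import pandas as pd

# Toy frame standing in for the cleaned data (values are made up).
df = pd.DataFrame({
    "Industry": ["FinTech", None, "E-commerce", "Logistics"],
    "Amount_in_USD": [3000000.0, float("nan"), float("nan"), 9000000.0],
})

# isna() marks missing cells; the column mean of that mask is the missing share.
missing_share = df.isna().mean().sort_values(ascending=False)

print(missing_share)
```

Any sum or average over `Amount_in_USD` silently skips the missing rows, so the totals plotted later cover only the deals with a disclosed amount.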
startup_data_df.describe()
| | s_no | Amount_in_USD |
|---|---|---|
| count | 3021.000000 | 2.061000e+03 |
| mean | 1520.154916 | 1.845738e+07 |
| std | 875.402679 | 1.214895e+08 |
| min | 1.000000 | 1.600000e+04 |
| 25% | 767.000000 | 4.700000e+05 |
| 50% | 1522.000000 | 1.700000e+06 |
| 75% | 2277.000000 | 8.000000e+06 |
| max | 3040.000000 | 3.900000e+09 |
Now we come to the analysis part of the project. Here we plot several graphs to see how trends in the startup funding ecosystem are changing.
Let's begin by importing matplotlib.pyplot and seaborn.
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
%matplotlib inline
#sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
#matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
!pip install plotly==4.14.3
import plotly.graph_objects as go
This graph shows how the running count of funded startups (one row per funding record) has grown since 2015.
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (8, 4)
plt.plot(startup_data_df.Date,startup_data_df.s_no)
plt.xlabel('Year')
plt.ylabel('No. of Startups')
plt.title('Number of Startups in the country over the years')
With the years on the x-axis and the number of startups on the y-axis, this graph shows how the number of startups in the country has been increasing with time. If you are an aspiring entrepreneur who has been hesitating to start your dream company, this steady growth should be encouraging.
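# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (10, 8)
# Pass the data through the keyword argument x, as seaborn expects.
sns.countplot(x=startup_data_df['Date'].dt.year)
plt.xlabel('Year')
plt.ylabel('No. of Startups Funded')
plt.title('No of Startups Funded Over the Years')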
This bar graph shows the number of startups funded in each year covered by the dataset, with 2016 having the most and 2020 the fewest. Note that the 2020 count is the lowest because the dataset has records only for the first month of that year.
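The bar chart counts rows per calendar year. As a minimal sketch, the same counts can be produced as a small table by extracting the year and counting. The dates below are made up for illustration and the variable names are assumptions.

```python
import pandas as pd

# Made-up funding dates standing in for the Date column.
dates = pd.to_datetime(pd.Series([
    "2015-03-01", "2015-07-15",
    "2016-01-10", "2016-05-05", "2016-11-30",
    "2020-01-09",
]))

# Count records per year, then order the result chronologically.
per_year = dates.dt.year.value_counts().sort_index()

print(per_year)
```

Sorting by index keeps the years in chronological order; `value_counts()` on its own orders them by frequency.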
# The ten most frequent investment types.
typeof_invest = startup_data_df['Investment_Type'].value_counts()[:10].index
# Total amount per type, looked up in the same order as the labels so that
# each bar pairs an investment type with its own total.
invest_amount = startup_data_df.groupby('Investment_Type')['Amount_in_USD'].sum().reindex(typeof_invest)
print(typeof_invest)
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (22, 10)
plt.bar(typeof_invest,invest_amount)
plt.xlabel('Type of Investment')
plt.ylabel('Investment Amount (USD)')
plt.title('Type of investment vs Amount invested')
Index(['Seed Funding', 'Private Equity', 'Seed Angel', 'Debt Funding',
'Series A', 'Series B', 'Series C', 'Series D', 'Pre-Series A', 'Seed'],
dtype='object')
This bar graph shows how investors distribute their money across investment types. Pre-Series A is typically the stage at which the founders of a company are first getting their operations off the ground. The graph suggests that investors want to invest in companies at the early stages of operation.
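A bar chart like this one is only meaningful when each label is paired with its own total. `value_counts()` orders labels by frequency, while `groupby(...).sum()` orders them alphabetically, so the two need a shared order before plotting. The sketch below uses a toy frame with made-up amounts, and `reindex` is one possible way to align them.

```python
import pandas as pd

# Toy funding records (amounts are made up).
df = pd.DataFrame({
    "Investment_Type": ["Seed Funding", "Seed Funding", "Seed Funding",
                        "Private Equity", "Private Equity", "Debt Funding"],
    "Amount_in_USD": [1e5, 2e5, 3e5, 5e7, 4e7, 1e6],
})

# Labels ordered by how often each type occurs.
top_types = df["Investment_Type"].value_counts().index

# Totals per type; groupby sorts the labels alphabetically.
totals_alphabetical = df.groupby("Investment_Type")["Amount_in_USD"].sum()

# Look the totals up in the label order so positions line up.
totals_aligned = totals_alphabetical.reindex(top_types)

print(list(totals_alphabetical.index))
print(totals_aligned)
```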
# The ten cities with the most funding records.
city_counts = startup_data_df['Location'].value_counts()[:10]
x = city_counts.index
y = city_counts.values
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (15, 6)
plt.bar(x,y)
plt.title('No. of startups in different Cities')
plt.xlabel('City')
plt.ylabel('No. of Startups')
Text(0, 0.5, 'No. of Startups')
This graph shows the number of startups in the country by city. Bengaluru (Bangalore) has the highest number of startups in the country, followed by Mumbai. The graph helps explain why Bengaluru is called the Silicon Valley of India.
fig1 = go.Figure(
data=go.Pie(values=startup_data_df['Industry'].value_counts()[:10].values,labels=startup_data_df['Industry'].value_counts()[:10].index))
fig1.show()
This pie chart shows each of the ten most common industries as a percentage of the funding records in those ten industries.
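The percentages drawn in the pie chart can also be computed directly. This minimal sketch uses a handful of made-up industry labels and `value_counts(normalize=True)`, which returns each label's share of the non-missing rows.

```python
import pandas as pd

# Made-up industry labels; the missing entry is ignored by value_counts().
industries = pd.Series(["E-commerce", "E-commerce", "FinTech", "Healthcare", None])

# Share of each label among the non-missing rows, as a percentage.
shares = industries.value_counts(normalize=True) * 100

print(shares)
```

When only the top ten labels are kept, as in the chart, the slices are shares of those ten groups rather than of the whole dataset.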
Here we ask five of the most important questions about funding for startups and present the findings from our analysis.
1) Who is the most important investor in the country?
2) How does the funding ecosystem change with time?
3) Who is the biggest investor in the country?
4) Do cities play a major role in funding?
5) Which industries are favored by investors for funding?

1) Who is the most important investor in the country?
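# Skip the most frequent entry of value_counts() and keep the next nine investors.
investor_counts = startup_data_df['Investor'].value_counts()[1:10]
a = investor_counts.index
b = investor_counts.values
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (27, 10)
plt.bar(a,b)
plt.xlabel('Investors')
plt.ylabel('No. of startups funded')
plt.title('Most Important Investors in the Indian Ecosystem')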
By number of startups funded, Ratan Tata is clearly the most important investor in the Indian startup funding ecosystem. As successful as he is, he has boosted many startups with his investments.
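Many rows in the `Investor` column list several names separated by commas (for example "Mumbai Angels, Ravikanth Reddy"), so counting the raw strings counts each combination of co-investors as its own entry. A possible refinement, sketched here on a few sample rows and not part of the original analysis, is to split the names and count each investor individually. The third row is made up for illustration.

```python
import pandas as pd

# Sample Investor values; the last row is a made-up extra deal.
investors = pd.Series([
    "Goldman Sachs, Rocket Internet",
    "Mumbai Angels, Ravikanth Reddy",
    "Mumbai Angels",
    None,
])

# Split on commas, give each name its own row, and trim the whitespace.
individual = investors.dropna().str.split(",").explode().str.strip()

# Number of deals each individual investor took part in.
deal_counts = individual.value_counts()

print(deal_counts)
```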
2) How does the funding ecosystem change with time?
# Total amount funded per calendar year; groupby returns the years in ascending order.
yearly_amount = startup_data_df.groupby(startup_data_df['Date'].dt.year)['Amount_in_USD'].sum()
year_val = list(yearly_amount.index)
amount = list(yearly_amount)
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (6, 5)
sns.scatterplot(x=year_val, y=amount)
plt.plot(year_val,amount)
plt.xlabel('Year')
plt.ylabel('Amount (USD)')
plt.title('Amount Funded over the Years')
The amount funded has alternately risen and fallen from one year to the next, with 2020 being the lowest because we have records only for the first month.
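To read yearly totals directly in billions of USD, the sums can be divided by 1e9 before plotting. The sketch below does this on a toy frame with made-up amounts; the column names mirror the notebook's, but the values are assumptions.

```python
import pandas as pd

# Toy funding records; the 2017 deal has an undisclosed (missing) amount.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-06-01", "2016-02-10", "2017-03-03"]),
    "Amount_in_USD": [1.0e9, 5.0e8, 2.5e9, float("nan")],
})

# Sum per calendar year, then rescale from USD to billions of USD.
yearly_billions = df.groupby(df["Date"].dt.year)["Amount_in_USD"].sum() / 1e9

print(yearly_billions)
```

A year whose amounts are all missing sums to 0, which is worth keeping in mind when a year looks unusually low.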
3) Who is the biggest investor in the country?
# The five investors with the largest total amount invested.
top_investors = startup_data_df.groupby('Investor')['Amount_in_USD'].sum().sort_values(ascending=False)[:5]
p = top_investors.index
q = top_investors.values
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (27, 6)
plt.bar(p,q)
plt.xlabel('Investors')
plt.ylabel('Amount invested (USD)')
plt.title('Biggest Investors in the Indian Startup Ecosystem')
The graph makes it clear that SoftBank has been the biggest investor in India.
4) Do cities play a major role in funding?
# The seven cities with the largest total amount invested.
top_cities = startup_data_df.groupby('Location')['Amount_in_USD'].sum().sort_values(ascending=False)[:7]
r = top_cities.index
s = top_cities.values
plt.bar(r,s)
plt.xlabel('City')
plt.ylabel('Investment Amount (USD)')
plt.title('Amount invested by City')
Text(0.5, 1.0, 'Amount invested by City')
Bengaluru (Bangalore), often called the Silicon Valley of India, has received the most funding in the country, which is consistent with it also having the highest number of startups.
5) Which industries are favored by investors for funding?
# The ten industries with the largest total amount invested.
top_industries = startup_data_df.groupby('Industry')['Amount_in_USD'].sum().sort_values(ascending=False)[:10]
g = top_industries.index
h = top_industries.values
# Set the figure size before plotting so that it applies to this figure.
matplotlib.rcParams['figure.figsize'] = (45, 20)
plt.bar(g,h)
plt.xlabel('Industry')
plt.ylabel('Investment Amount (USD)')
plt.title('Amount invested by Industry')
The E-commerce industry has received the highest funding among all the sectors. E-commerce is also one of the most popular industries for startups.
jovian.commit()
[jovian] Updating notebook "adithyanovak2001/indian-startup-funding-analysis-project-jovian-main" on https://jovian.ai [jovian] Committed successfully! https://jovian.ai/adithyanovak2001/indian-startup-funding-analysis-project-jovian-main
'https://jovian.ai/adithyanovak2001/indian-startup-funding-analysis-project-jovian-main'